Abstract

The psycho-visual perception of distorted natural scenes and the structured nature of video data 
motivate introducing content-aware decisions that utilize this structure for improved video transmission 
over wireless networks. In this paper, we introduce an architecture for real-time video transmission 
over multiple-input multiple-output (MIMO) wireless communication systems using loss visibility side 
information. The perceptual importance of a packet is quantified through the packet loss visibility,
and the video content is characterized by the loss visibility distribution, which we propose to estimate
online. To jointly capture video quality and network throughput, we define the optimization objective
as the throughput weighted by the loss visibility of each packet, a metric coined perceived throughput.
At the MIMO physical layer, we use the loss visibility side information to classify video packets
and transmit them through different subchannels of the MIMO channel using a thresholding policy. We
derive the optimal thresholding policy as a function of the loss visibility distribution. The globally
optimal policy provides a load balancing policy that equates the throughput among streams. We further
prove that the proposed architecture enables a multiplicative quality gain due to packet prioritization 
and a throughput gain due to the load balancing property of the thresholding policy. Results show that 
the composite quality and throughput gains are significant under full channel state information as well 
as limited feedback. Tested on H.264-encoded video sequences, the proposed architecture achieves the
same video quality with a 17 dB reduction in transmit power for a 2 × 2 MIMO system. We also demonstrate
gains in excess of 10 dB for a range of antenna configurations.
I. Introduction

Improving the perceptual quality associated with video streaming over lossy networks requires 
developing advanced video-aware transmission techniques that adapt using knowledge of the 
expected user response to video distortions and losses. The response to video packet losses and 
distortions is inherently unequal due to the structural dependencies in the encoded video data 
and the psycho-visual perception of natural scenes with varying temporal and spatial character- 
istics. This motivates introducing perceptually-driven optimizations at the edge of the network. 
Advanced PHY layer designs, such as multiple-input multiple-output (MIMO) processing, can 
be inexpensively used to realize these optimizations by providing unequal error protection and 
prioritized delivery. Furthermore, MIMO systems are an integral part of state-of-the-art wireless 
standards such as 3GPP Long Term Evolution (LTE) and IEEE 802.11n, which deliver the bulk
of stored and real-time video traffic. 

Video packet loss visibility, defined as the probability that the artifact due to the loss of a given 
packet is visible to the average user, is one choice of side information for the MAC and PHY 
layers to use in making video-aware decisions. The objective of loss visibility modeling and
estimation (e.g., [1], [2]) is to find the model that best correlates the loss visibility estimate with the
results reported by viewers through subjective tests, thus naturally capturing the user perception. 
Consequently, incorporating loss visibility into network-level and link-level adaptation enables 
perceptual optimization. Furthermore, loss visibility side information is inexpensive to transmit 
to the network edge by quantizing and embedding it into the packet headers. The loss visibility
of different video slices or packets exhibits high variability because of the inherent features of
state-of-the-art video codecs (e.g., [3], [4]) such as inter-frame coding, motion compensation, and
error concealment. Inter-frame coding introduces packet dependencies in the temporal domain, 
thus causing different error propagation patterns, and increasing the loss visibility variability. 
Furthermore, the non-uniform motion across different spatial locations causes loss visibility to 
be unequal among slices and dependent on the error concealment method. 

A. Paper Contributions 

1) Loss Visibility Optimized Video Transmission: We propose a new cross-layer architecture
for real-time video transmission over MIMO systems that uses loss visibility side information.
Our approach uses the loss visibility scores to estimate the loss visibility distribution and characterize
the content. At the PHY/MAC layers, we use the loss visibility values to classify video
packets; each class of packets is transmitted through a different spatial stream corresponding to a 
decomposed subchannel of the MIMO channel. The packet classes are identified through a loss 
visibility based thresholding policy. To optimize the selection of the loss visibility thresholds, we 
define a perceptually-aware metric called perceived throughput. It generalizes the conventional 
notion of throughput by weighting each packet by its loss visibility. The perceptual relevance 
of this metric arises from the fact that the loss visibility values reflect the visual perception 
of a corresponding packet loss. Given any continuous loss visibility distribution, we derive the 
globally-optimal transmission policy and show that it provides a load balancing solution that maps 
packets to streams to equate the throughput among streams. In comparison with conventional 
MIMO signaling whereby the symbols from each packet are mapped to all spatial streams, the 
solution provides a load balancing gain equal to the achievable throughput averaged over streams 
divided by the achievable throughput on the worst stream. This translates into a throughput gain, 
specifically in the low to medium SNR regime. 

2) Prioritized Video Packet Delivery: In addition to the throughput gain due to the optimal 
load balancing solution, we show that a video quality gain is achieved due to sending the 
more visible packets through the more reliable spatial streams. This is referred to as the packet 
prioritization gain. We evaluate this gain by estimating packet loss visibility for H.264 encoded 
video sequences and testing the loss visibility based prioritization policy for several MIMO 
antenna configurations. For a 2 × 2 MIMO system with 2 spatial streams, a target video quality
of 0.9 on the MS-SSIM scale requires $E_s/N_0 = 3$ dB with prioritization versus $E_s/N_0 = 20$
dB without prioritization. The 17 dB gain is achieved by splitting the video packets into only
two priority classes. We further demonstrate gains in excess of 10 dB for several configurations 
over a large range of SNRs. 

3) Mode Selection and Content-aware Adaptation under Full and Limited Feedback: We 
propose selecting the MIMO mode corresponding to the number of spatial substreams in a 
manner that captures video quality as well as throughput maximization. If the loss visibility 
distribution characterizes a source with high variability, a higher mode is preferable to provide 
prioritized delivery by adding more packet classes under good channel conditions. Conversely,
if the variability in packet importance is low, then the contribution of packet prioritization is
minimal and reliable delivery with a smaller number of spatial streams may be preferred. Thus,
our proposed approach adapts mode selection according to the video source in a content-aware 
manner. Content-aware adaptation is also enabled through the dependence of the transmission 
policies on the loss visibility distribution that characterizes the video content type. Our proposed 
solutions are also extended to codebook-based limited feedback systems where the channel state 
information is quantized at the receiver and sent back to the transmitter over a feedback link. We 
show that a similar load balancing policy is optimal under limited feedback but the corresponding 
load balancing gain drops because the unequal stream quality cannot be fully utilized due to 
channel quantization error. 

B. Related Work 

We review related work on loss visibility estimation and modeling, loss visibility based 
optimization, and adaptive MIMO transmission for video content. In [1], a generalized linear
model is proposed for video packet loss visibility modeling considering factors within a packet 
and its vicinity to capture the temporal and spatial distortions. The bag of features used to 
estimate loss visibility is versatile by being applicable over a range of encoding standards, GoP 
structures, and error concealment methods. Some features such as motion magnitude, motion 
variance, distance from scene cut, and camera motion capture the video source properties. Other 
features such as initial structural similarity index (SSIM), maximum per-macroblock (MB) mean 
square error (MSE), and spatial extent capture the distortions caused by the loss in spatial domain. 
Temporal error propagation is also captured through features related to the number of frames 
affected by the loss, distance to reference frame, error concealment method, and other scene 
loss concealment. The generalized linear model using these features is fit based on subjective 
tests. Other related loss visibility modeling approaches can be found in [2] and [5]. Besides
the regression-based methods, [2] proposes a classification-based approach using a statistical
tool called classification and regression trees (CART) to classify each packet loss as visible or
invisible. The proposed loss visibility model is applied in [6] for selecting unequal coding rates
for different slices and for resource allocation in an OFDM system. 

In this work, we propose a generic framework that allows using loss visibility models to 
optimize transmission policies at the PHY and MAC protocol layers. Specifically, we apply 
the generalized linear modeling approach in [1] for loss visibility estimation of H.264-encoded
sequences due to its versatility and high classification accuracy. We further argue that the loss
visibility distribution is a natural indicator of the video content characteristics, and thus, we 
propose to inexpensively estimate and update the distribution using non-parametric learning, and 
subsequently use it for content-aware adaptation. 

For scalable video sequences, the loss visibility varies significantly across temporal, spatial, 
and quality layers. Estimating the average loss visibility of packets from each scalable video 
layer is addressed in [7]-[9]. Online learning is used to specify the maximum fraction of packet
losses from each layer to meet a target video quality. The online algorithm uses local linear 
regression to estimate the video quality loss due to packet losses from a specific video layer. 
Based on the ACK history information, the local linear regression fit is updated and the unequal 
protection levels are estimated continuously over time. Adjusting the learning window provides 
a bias/variance tradeoff whereby a large learning window provides a more factual estimation of 
loss visibility while a smaller learning window enables finer adaptation to the changing video 
temporal characteristics. 

While loss visibility-based adaptation approaches have not been heavily investigated in the
literature, other adaptive video transmission techniques such as joint source-channel coding (JSCC)
[9]-[12], unequal error protection (UEP) [7], [13], [14], prioritized scheduling [15], and
distortion-aware resource allocation [16], [17] have been proposed to increase video quality and
error resilience. Previous work, however, does not present a generic framework for incorporating
loss visibility-based decisions into wireless networks. To the best of our knowledge, this is the
first comprehensive work that defines a generic cross-layer design for using loss visibility in
wireless networks, develops MIMO transmission strategies for prioritized delay-sensitive video
delivery, and derives corresponding closed-form gain expressions.

Adaptive MIMO transmission for video content has been investigated in [18]-[23]. In [18], a
cross-layer framework for MIMO video broadcast is proposed by allocating scalable video layers
to the end-users jointly with precoder computation to ensure that delay and buffer constraints
are met. In [19], a layered video transmission scheme over MIMO is proposed. It periodically
switches each bit stream among multiple antennas to match the ordering of subchannel SNRs,
thus providing prioritized delivery. In [20], a method is proposed to adaptively control the
diversity and multiplexing gain of a MIMO system to minimize the cumulative video distortion
and satisfy delay constraints. Finally, in [21], distortion-aware MIMO link adaptation techniques
are proposed for MCS and MIMO mode selection. Since [18], [19] are only applicable to
scalable video coded bitstreams, the application scope of the proposed techniques is limited
as the majority of current video content is non-scalable. Furthermore, [20], [21] rely on
rate-distortion information, which is typically not available for real-time encoded or transcoded video.

C. Paper Organization and Notation 

The rest of the paper is organized as follows: We present the MIMO system model and the 
loss visibility-based model in Section II. In Section III, we define the framework for perceptual
optimization using loss visibility. In Section IV, we derive the optimal thresholding policy and
present the loss visibility optimized MIMO transmission algorithm. In Section V, we derive
the corresponding quality and throughput gains. We present results and analysis using encoded
video sequences in Section VI to quantify the achievable gains. Finally, concluding remarks
are provided in Section VII. Throughout this paper, the following notation is used: $\mathcal{A}$ is a set,
$\mathbf{A}$ is a matrix, $\mathbf{a}$ is a vector, and $a$ is a scalar. The probability density function (PDF) and
the cumulative distribution function (CDF) of a random variable $A$ are denoted $f_A(\cdot)$ and $F_A(\cdot)$,
respectively. Its expectation is denoted by $\mathbb{E}[\cdot]$. We use random variables to characterize the
channel variation, determined by the channel matrix, as well as the source variation, determined 
by the loss visibility values. Other notation is defined when needed. 

II. System Overview 

This section introduces the MIMO system model, the loss visibility modeling approach, and 
the proposed cross-layer design that enables loss visibility based optimization. 

A. PHY Layer: MIMO System Model 

Consider a narrowband MIMO wireless system with $N_t$ transmit antennas and $N_r$ receive
antennas. The system uses $S$ spatial streams where $S \le \min(N_t, N_r)$ and each stream corresponds
to a stream of constellation symbols. In our MIMO system, the size of the symbols and the code
rate may vary per substream, as may the number of substreams, which is known as mode adaptation.
Thus, we have $1 \le S \le \min(N_t, N_r)$. Linear precoding enables mapping a symbol vector $\mathbf{s}$
from each spatial stream to an $N_t$-dimensional spatial signal using an $N_t \times S$ linear precoding
matrix $\mathbf{F}_S$. The spatial signal encounters a channel matrix $\mathbf{H}$ and an additive noise vector $\mathbf{n}$.
The corresponding input-output relationship is 
where the entries of $\mathbf{n}$ are each distributed according to $\mathcal{CN}(0, N_0)$. The matrix $\mathbf{H}\mathbf{F}_S$ can be thought of as
an effective channel. The receiver decodes $\mathbf{y}$ using this effective channel and a zero forcing
receiver. We assume a block-fading model whereby the channel realization $\mathbf{H}$ is fixed over a
set of packets $\mathcal{V}$ and then independently takes a new realization. All the transmission decisions
are adapted every channel coherence time, which could be as small as one packet duration, i.e.,
$|\mathcal{V}| \ge 1$, thus being applicable over a range of mobility scenarios. For a zero forcing receiver,
it is shown in [24] that the SNR on the $i$th stream is

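$$\gamma_i(\mathbf{H}) = \frac{E_s}{N_t N_0} \, \frac{1}{\left[\mathbf{F}_S^* \mathbf{H}^* \mathbf{H} \mathbf{F}_S\right]^{-1}_{i,i}}. \tag{2}$$
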
We consider both cases of perfect and imperfect transmitter channel state information (CSIT). In 
both scenarios, we assume that the feedback delay is negligible and the transmitter and receiver 
are fully synchronized. With perfect CSIT, the MIMO channel can be converted to parallel, 
noninterfering single-input single-output (SISO) channels through a singular value decomposition 
(SVD) of the channel matrix [25]. We consider unitary precoding whereby the columns of $\mathbf{F}_S$
are restricted to be orthogonal. While this could be further generalized to a non-unitary power
constraint, we note that using the unitary constraint along with multimode precoding results in
near optimal capacity performance [26]. Thus, we create $\mathbf{F}_S$ from a normalized version of the
right singular vectors of $\mathbf{H}$ as follows

$$\mathbf{F}_S = [\mathbf{V}]_{:,1:S} \tag{3}$$



where $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^*$ is the singular value decomposition of $\mathbf{H}$. Under the precoding structure in
(3), the SNR for the $i$th stream simplifies to

where $\sigma_i$ is the $i$th singular value of $\mathbf{H}$. For quantized CSIT, the receiver chooses a precoding
matrix $\mathbf{F}_S$ from a codebook $\mathcal{F}_S$ consisting of a finite set of precoding matrices. There are
$\log_2(|\mathcal{F}_S|) = B_S$ bits of feedback used to convey the index of the chosen precoding matrix back
to the transmitter if $S$ spatial streams are used. For simulations, the codebook $\mathcal{F}_S$ is designed
using Grassmannian subspace packing with the chordal subspace distance measure as described
in [27]. The criterion for selecting the precoder at the receiver is to maximize the minimum
singular value, that is, $\mathbf{F}_S = \arg\max_{\mathbf{F} \in \mathcal{F}_S} \lambda_{\min}(\mathbf{H}\mathbf{F})$.
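
As a worked step for the perfect-CSIT case above, the following sketch shows why the zero forcing SNR reduces to a function of a single singular value under SVD-based precoding. It assumes that $\mathbf{F}_S$ consists of the first $S$ right singular vectors with no additional scaling and that $\sigma_i > 0$ for $i \le S$; any constant normalization of the precoder only rescales the result.

```latex
\begin{align*}
\mathbf{F}_S^* \mathbf{H}^* \mathbf{H} \mathbf{F}_S
  &= [\mathbf{V}]_{:,1:S}^* \, \mathbf{V} \boldsymbol{\Sigma}^* \boldsymbol{\Sigma} \mathbf{V}^* \, [\mathbf{V}]_{:,1:S}
   = \operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_S^2\right), \\
\left[ \mathbf{F}_S^* \mathbf{H}^* \mathbf{H} \mathbf{F}_S \right]^{-1}_{i,i}
  &= \frac{1}{\sigma_i^2}
  \quad \Longrightarrow \quad
  \gamma_i(\mathbf{H}) \propto \sigma_i^2 .
\end{align*}
```

The first equality uses the unitarity of $\mathbf{V}$, so that $\mathbf{V}^* [\mathbf{V}]_{:,1:S}$ selects the first $S$ columns of the identity matrix.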

B. APP Layer: Loss Visibility Estimation 

The objective of loss visibility estimation is to associate a packet $p$ with a value $v[p]$ ranging
from 0 to 1 that indicates the loss visibility of the packet. A value $v[p] = 0$ indicates that losing
packet $p$ does not have a visible impact on the end video quality whereas a value $v[p] = 1$
indicates that the loss of packet p will certainly be visible. 

In this work, we apply loss visibility estimation at the APP layer based on the generalized 
linear model (GLM) approach proposed in [1]. To estimate loss visibility, we extract video
features both from the raw video reference and from the encoded bitstream. A video frame is
divided into a set of slices, each corresponding to a horizontal group of MBs. We apply forward
motion estimation to each MB to estimate the motion magnitude for each MB and compute the 
slice motion magnitude as the average per-MB motion magnitude. The residual energy for each 
MB is computed from the corresponding motion-compensated residual signal. By thresholding 
the average motion in the entire video frame, we detect if the scene consists of a still background 
or if there is camera motion. In addition to these features, we extract features from the encoded 
bitstream. Specifically, based on the frame type and the inter-frame prediction settings, we flag 
each packet as affecting one or multiple frames. To capture spatial-domain distortions, we further 
compute the initial SSIM feature corresponding to the SSIM in the frame affected by the loss, 
and max initial mean square error (IMSE) representing maximum per-MB MSE in the same 
frame. For video sequences with multiple scenes, we detect scene cuts and use them to flag
packets concealed using a reference from a previous scene, for which losses are more
visible. We also flag packets before scene cuts, for which losses will be barely visible. While
other features are defined in [1], subjective tests show that only the ones mentioned above have
high (positive or negative) correlation with loss visibility as reported by viewers. Using all these 
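features, denoted $x_1[p], \ldots, x_f[p]$ for packet $p$, we use the following logistic regression model for loss visibility estimation

$$v[p] = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{f} \beta_j x_j[p]\right)\right)} \tag{5}$$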
Fig. 1. System block diagram.



where $\boldsymbol{\beta} = \{\beta_0, \beta_1, \ldots, \beta_f\}$ are the intercept and the coefficients associated with the different
features. We use the coefficients as reported in Table IV in [1]. We assume the loss visibility $v[p]$
of packet p is communicated to the physical layer through the packet header and deep packet 
inspection can be performed at the network edge to extract the loss visibility. 
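
To make the estimation step concrete, the following minimal sketch evaluates a logistic model of the kind described above for a single packet. The feature values, the intercept `BETA_0`, and the coefficients `BETAS` are placeholders chosen for illustration only; they are not the fitted values from Table IV in [1], and the function name `loss_visibility` is hypothetical.

```python
import math


def loss_visibility(features, intercept, coefficients):
    """Logistic-regression estimate of v[p] from one packet's feature values."""
    if len(features) != len(coefficients):
        raise ValueError("expected one coefficient per feature")
    # Linear predictor: intercept plus the weighted sum of the features.
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Numerically stable logistic function; the result lies in [0, 1].
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# Hypothetical features: [motion magnitude, initial SSIM, multi-frame flag].
# The intercept and coefficients below are placeholders, not fitted values.
BETA_0 = -1.0
BETAS = [0.8, -2.0, 1.5]

v_static = loss_visibility([0.1, 0.98, 0.0], BETA_0, BETAS)
v_moving = loss_visibility([2.5, 0.60, 1.0], BETA_0, BETAS)
print(f"static slice: {v_static:.3f}, high-motion slice: {v_moving:.3f}")
```

With these placeholder coefficients, the high-motion slice that affects several frames receives a larger visibility score than the static slice, which is the qualitative behavior the model is meant to capture.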

C. Loss Visibility-based Cross-layer Design 

The proposed cross-layer system block diagram, including loss visibility estimation and loss
visibility-based optimization, is shown in Figure 1. We propose to divide the incoming packets
into S classes based on their loss visibility using a thresholding-based policy. Under this policy, 
the highest priority packets are transmitted through the most reliable spatial stream. We define 
a vector of thresholds $\mathbf{v} = \{v_i\}_{i=1}^{S+1}$ where $0 \le v_i \le v_{i+1} \le 1$, with $v_1 = 0$ and $v_{S+1} = 1$ by
definition. At the PHY layer, the packet prioritization demultiplexer transmits a packet $p$ through
stream $i$ if $v_i \le v[p] < v_{i+1}$. The packets are queued ahead of each transmit chain to absorb any
mismatch between the instantaneous source and transmission rates. 
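
The demultiplexing rule can be sketched as a sorted-threshold lookup. The function names `stream_for_packet` and `demultiplex` are hypothetical, streams are indexed from 1 to $S$, and assigning $v[p] = 1$ to the last stream is an assumption made here so that every value in $[0, 1]$ maps to some stream.

```python
from bisect import bisect_right


def stream_for_packet(visibility, thresholds):
    """Return the 1-based stream index i with v_i <= v[p] < v_{i+1}.

    thresholds holds [v_1, ..., v_{S+1}] with v_1 = 0 and v_{S+1} = 1.
    """
    if thresholds[0] != 0.0 or thresholds[-1] != 1.0:
        raise ValueError("thresholds must start at 0 and end at 1")
    if any(a > b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be non-decreasing")
    if not 0.0 <= visibility <= 1.0:
        raise ValueError("loss visibility must lie in [0, 1]")
    num_streams = len(thresholds) - 1
    # bisect_right counts the thresholds <= visibility; the clamp sends
    # v[p] = 1 to the last (most reliable) stream.
    return min(bisect_right(thresholds, visibility), num_streams)


def demultiplex(packets, thresholds):
    """Group (packet_id, visibility) pairs into one FIFO queue per stream."""
    queues = {i: [] for i in range(1, len(thresholds))}
    for packet_id, visibility in packets:
        queues[stream_for_packet(visibility, thresholds)].append(packet_id)
    return queues


example_thresholds = [0.0, 0.2, 0.6, 1.0]  # S = 3 streams
example_queues = demultiplex(
    [("a", 0.9), ("b", 0.05), ("c", 0.95)], example_thresholds
)
print(example_queues)
```

Because the streams are ordered from least to most reliable, the packets with the largest visibility values end up in the highest-indexed queue.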

D. Modulation, Coding, and Retransmission 

We apply unequal modulation per stream. The data through stream i is modulated with a 
QAM constellation of size $M_i \in \mathcal{M}$. Each constellation is normalized such that the average
symbol energy is unity. For a given channel realization, the vector of modulation schemes is
denoted $\mathbf{M} = \{M_i\}_{i=1}^{S}$. The set of channel coding rates is $\mathcal{C}$ and the data through all streams
are coded with coding rate $C \in \mathcal{C}$.

The probability of packet error through stream $i$, conditioned on the modulation scheme $M_i$,
the coding rate $C$, and the $i$th post-processing SNR $\gamma_i(\mathbf{H})$, is denoted $p_i = p(M_i, C, \gamma_i(\mathbf{H}))$. The
uncoded $M$-QAM error probability expressions $p_{\text{uncoded}}(M, \gamma)$ are provided in the literature (e.g.,
[28]). Given a set of channel codes $\mathcal{C}$, we estimate the coding gain of each particular code as
follows. The PER waterfall curve for each MCS, $p(M_i, C, \gamma_i(\mathbf{H}))$, is estimated through Monte
Carlo simulations. Then, the estimated coding gain is the value $g(C)$ that provides the best fit with
the translated uncoded expressions, i.e., $g(C) = \arg\min \| p(M, C, \boldsymbol{\gamma}) - p_{\text{uncoded}}(M, \boldsymbol{\gamma} + g(C)) \|$,
where $\boldsymbol{\gamma}$ is a representative vector of SNR values. It follows that the coded PER expression can
be approximated as $p(M_i, C, \gamma_i(\mathbf{H})) \approx p_{\text{uncoded}}(M_i, \gamma_i(\mathbf{H}) + g(C))$ for each coding rate.
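
The coding gain fit described above can be sketched as a one-dimensional grid search. This is an illustration under stated assumptions rather than the procedure used for the reported results: `uncoded_per` uses the common nearest-neighbor approximation of the square $M$-QAM symbol error rate with independent symbol errors and a hypothetical packet length, SNRs and gains are in dB, and the "simulated" coded curve is synthetic, generated by shifting the uncoded curve by 3 dB.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def uncoded_per(mod_order, snr_db, symbols_per_packet=100):
    """Approximate uncoded packet error rate for square M-QAM (assumed model)."""
    snr = 10.0 ** (snr_db / 10.0)
    ser = 4.0 * (1.0 - 1.0 / math.sqrt(mod_order)) * q_function(
        math.sqrt(3.0 * snr / (mod_order - 1))
    )
    ser = min(ser, 1.0)
    # A packet is in error if any of its symbols is in error.
    return 1.0 - (1.0 - ser) ** symbols_per_packet


def estimate_coding_gain(mod_order, snr_grid_db, simulated_per, candidates_db):
    """Pick the dB shift g that best aligns the uncoded curve with the data."""
    def misfit(gain_db):
        return math.sqrt(sum(
            (sim - uncoded_per(mod_order, snr + gain_db)) ** 2
            for snr, sim in zip(snr_grid_db, simulated_per)
        ))
    return min(candidates_db, key=misfit)


snr_grid = list(range(0, 21, 2))
# Synthetic stand-in for a Monte Carlo PER curve: uncoded curve shifted by 3 dB.
synthetic_per = [uncoded_per(16, s + 3.0) for s in snr_grid]
candidate_gains = [0.5 * k for k in range(13)]
fitted_gain = estimate_coding_gain(16, snr_grid, synthetic_per, candidate_gains)
print(f"fitted coding gain: {fitted_gain} dB")
```

In practice the synthetic curve would be replaced by PER values measured for each modulation and coding pair, and the fit would be repeated per coding rate.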

Retransmission with a finite retransmission limit is applied in the system to enable high 
reliability. Given a retransmission limit of $r$ retransmissions, determined by the MAC protocol,
the number of retransmissions follows a truncated geometric distribution assuming the channel
is fixed during retransmission. Thus, the mean number of transmissions through stream $i$ is

$$\mu_i = \sum_{k=1}^{r+1} k (1 - p_i) p_i^{k-1} + (r+1) p_i^{r+1} = \frac{1 - p_i^{r+1}}{1 - p_i} \tag{6}$$

since $(1 - p_i) p_i^{k-1}$ is the probability of success in $k$ transmissions and $p_i^{r+1}$ is the post-retransmission
failure probability. We define the post-retransmission probability of successful packet delivery through stream
$i$ as

$$p_i^s = 1 - p_i^{r+1}. \tag{7}$$

To ensure prioritized delivery, we define the best stream as the most reliable stream for packet
delivery. Thus, streams are ordered by the post-retransmission success probability, i.e., $p_i^s \le p_{i+1}^s$.
Note that this ordering captures the effect of modulation, coding, retransmission, and channel
state because $p_i^s$ is a function of $M_i$, $C$, $r$, and $\gamma_i(\mathbf{H})$. In fact, this represents a generalization of
SNR ordering to the case of unequal modulations.
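
The retransmission quantities above are simple to compute. The sketch below evaluates the mean number of transmissions both by the explicit truncated-geometric sum and by its closed form, and orders streams from least to most reliable. All function names are hypothetical, and the per-stream packet error rates are made-up inputs.

```python
def mean_transmissions(per, retry_limit):
    """Closed form (1 - p^(r+1)) / (1 - p) of the mean number of transmissions."""
    if per == 1.0:
        return float(retry_limit + 1)  # every attempt fails, all are used
    return (1.0 - per ** (retry_limit + 1)) / (1.0 - per)


def mean_transmissions_by_sum(per, retry_limit):
    """Same quantity from the truncated geometric distribution, term by term."""
    attempts = retry_limit + 1
    # Success on the k-th transmission has probability (1 - p) p^(k-1).
    success_terms = sum(
        k * (1.0 - per) * per ** (k - 1) for k in range(1, attempts + 1)
    )
    # If all attempts fail (probability p^(r+1)), r + 1 transmissions were used.
    return success_terms + attempts * per ** attempts


def success_probability(per, retry_limit):
    """Post-retransmission probability of successful delivery, 1 - p^(r+1)."""
    return 1.0 - per ** (retry_limit + 1)


def order_streams(pers, retry_limit):
    """Return stream indices sorted from least to most reliable."""
    return sorted(
        range(len(pers)),
        key=lambda i: success_probability(pers[i], retry_limit),
    )


example_order = order_streams([0.1, 0.5, 0.3], retry_limit=2)
print(example_order)
```

The agreement between the two `mean_transmissions` variants is a quick numerical check of the closed-form expression.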
III. A Framework for Loss Visibility-based Perceptual Optimization

In this section, we present a framework for using loss visibility side information to characterize 
the video content by estimating the loss visibility distribution. We further propose an optimization 
metric that uses loss visibility to jointly maximize video quality and network throughput. 

We propose characterizing video source variability at two different timescales as follows: 

1) Short-term variations: The short-term variations in video traffic characteristics are driven 
by inter-frame coding and unequal motion that cause unequal packet importance in the 
temporal and spatial domain. We capture this variation using the loss visibility scores 
which drive fine packet-level adaptations such as unequal error protection and packet 
prioritization. 

2) Long-term variations: The long-term variations at the scene level are driven by scene 
changes, camera motion, etc. We capture the variations in the scene characteristics us- 
ing the loss visibility distribution which is used to drive coarse scene-level adaptations 
such as content-aware adaptation. We estimate and update the loss visibility distribution 
inexpensively using non-parametric learning. 

A. Loss Visibility Distribution Estimation 

We propose to estimate the loss visibility distribution using kernel density estimation (KDE) 
[29], update it using the values of incoming packets, and use it to derive the optimal transmission 
policy. With KDE, the density estimate at $0 \le x \le 1$, denoted by $f_v(x)$, is as follows

$$f_v(x) = \frac{1}{W} \sum_{i=1}^{W} K_h(x - v[p_i]) = \frac{1}{Wh} \sum_{i=1}^{W} K\!\left(\frac{x - v[p_i]}{h}\right) \tag{8}$$

where $v[p_i]$ is the loss visibility of the $i$th packet in the window, $W$ is the window corresponding
to the number of packets over which the estimate is
obtained, and $K_h(\cdot)$ is a kernel with smoothing parameter $h > 0$. The distribution is inexpensive to
compute and update as it consists only of linear operations. Using the loss visibility distribution
$f_v(v)$ as defined in (8) provides the following advantages:

1) Since for real-time video, large buffers are not available, the buffered packet values are 
not representative of the loss visibility variability. Thus, the loss visibility distribution is 
used instead to capture this variability and provide a notion of relative packet importance. 
2) Content-aware adaptation is enabled by capturing long-term variations in video traffic
characteristics. Thus, optimized PHY/MAC policies can be developed for that purpose. 

3) Adjusting the kernel density estimation window W and smoothing parameter h provides a 
bias/variance tradeoff between factual estimation of the loss visibility and fine adaptation 
to changing video characteristics. 
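
A sliding-window estimator of this kind can be sketched in a few lines. The class name `VisibilityDensity` is hypothetical, and the Gaussian kernel is an assumption made for illustration (the kernel choice is not specified above); a Gaussian kernel places a small amount of mass outside $[0, 1]$ near the boundaries, which this sketch does not correct.

```python
import math
from collections import deque


class VisibilityDensity:
    """Kernel density estimate over the last `window` loss visibility values."""

    def __init__(self, window, bandwidth):
        if window <= 0 or bandwidth <= 0:
            raise ValueError("window and bandwidth must be positive")
        # A bounded deque drops the oldest value once the window is full.
        self.samples = deque(maxlen=window)
        self.h = bandwidth

    def update(self, visibility):
        """Add the loss visibility of a newly arrived packet."""
        self.samples.append(visibility)

    def pdf(self, x):
        """Density estimate at x with a Gaussian kernel of width h."""
        if not self.samples:
            raise ValueError("no samples observed yet")
        norm = len(self.samples) * self.h * math.sqrt(2.0 * math.pi)
        return sum(
            math.exp(-0.5 * ((x - v) / self.h) ** 2) for v in self.samples
        ) / norm

    def cdf(self, x):
        """Integrated estimate: the average of the Gaussian kernel CDFs."""
        if not self.samples:
            raise ValueError("no samples observed yet")
        return sum(
            0.5 * (1.0 + math.erf((x - v) / (self.h * math.sqrt(2.0))))
            for v in self.samples
        ) / len(self.samples)


estimator = VisibilityDensity(window=3, bandwidth=0.1)
for value in [0.9, 0.1, 0.15, 0.2]:
    estimator.update(value)
print(estimator.pdf(0.15), estimator.cdf(0.5))
```

Enlarging `window` or `bandwidth` smooths the estimate, while smaller values let it track scene changes more quickly, which mirrors the tradeoff in item 3 above.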

B. A Perceptually-Optimized Throughput Metric: Perceived Throughput 

To capture video quality at the long and short timescales in PHY/MAC optimizations, we 
propose generalizing the conventional notion of throughput by weighting each packet by its loss 
visibility. The weighted throughput metric is coined perceived throughput and represents the 
total perceptual value of packets transmitted per unit time. Under this objective, if packet p is 
successfully received, it contributes a value v[p] to the user perception. Gains in perceived 
throughput correspond to composite gains in perceived video quality and throughput. Most 
generally, the perceived throughput expression is 

$$PT = \frac{\int_0^1 p^s(v)\, v\, f_v(v)\, dv}{t} \tag{9}$$

where $f_v(v)$ is the loss visibility distribution, $p^s(v)$ is the probability that a packet with loss
visibility $v$ is successfully delivered (after potential retransmission), and $t$ is the time to
transmit $1/dv$ packets. The dependence of the success probability on the packet values is intended
to capture general unequal error protection policies. For the MIMO system with the thresholding
policy presented in §II-C, the expression is as follows

$$PT = \frac{\sum_{i=1}^{S} p_i^s V_i}{\max_i t_i} \tag{10}$$

where $V_i = \int_{v_i}^{v_{i+1}} v f_v(v)\, dv$ is the cumulative value of packets transmitted through stream $i$, $p_i^s$ is
the probability of post-retransmission successful packet delivery defined in (7), and $t_i$ is the mean
time to transmit the $i$th class packets through stream $i$ averaged over packets in a single channel
coherence time. Since for real-time video delivery, the packets through all streams should be
delivered by a certain deadline, the transmission time is the worst case among all streams. The
time to transmit a packet through stream $i$ is $b[p](1 - p_i^{r+1})/(CB \log_2 M_i (1 - p_i))$, where $b[p]$ is
the size of packet $p$. Taking the expectation over class $i$ packets, we obtain
TABLE I

Commonly used notation

Symbol | Description
$N_t$ | Number of transmit antennas
$N_r$ | Number of receive antennas
$S$ | Number of spatial streams
$f_v(v)$ | Packet loss visibility distribution
$V_i$ | Cumulative loss visibility values of class $i$ packets (i.e., transmitted through the $i$th stream)
$\mathbf{v}$ | Vector of loss visibility thresholds where $v_i$ is the threshold between stream $i$ and $i - 1$
$\gamma_i(\mathbf{H})$ | Post-processing SNR on $i$th stream
$t_i$ | Mean time to transmit a class $i$ packet
$\mathbf{M} = \{M_i\}_{i=1}^{S}$, $M_i \in \mathcal{M}$ | Vector of modulation schemes per stream
$C \in \mathcal{C}$ | Coding rate
$p_i = p(M_i, C, \gamma_i(\mathbf{H}))$ | Packet error rate
$p_i^s = 1 - p_i^{r+1}$ | Post-retransmission probability of successful packet delivery
$\mu_i$ | Average number of retransmissions for packets transmitted through stream $i$


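$$t_i = \mathbb{E}\!\left[\frac{b[p]\,(1 - p_i^{r+1})}{CB \log_2 M_i (1 - p_i)}\right] \left(F_v(v_{i+1}) - F_v(v_i)\right) = \frac{\mathbb{E}[b]\,\left(F_v(v_{i+1}) - F_v(v_i)\right)(1 - p_i^{r+1})}{CB \log_2 M_i (1 - p_i)} \tag{11}$$

where $\mathbb{E}[b]$ is the mean packet size. Thus, the perceived throughput is

$$PT(\mathbf{v}, \mathbf{M}, C, S) = \frac{\sum_{i=1}^{S} p_i^s V_i}{\mathbb{E}[b] \max_i \left\{ \left(F_v(v_{i+1}) - F_v(v_i)\right)(1 - p_i^{r+1}) / \left(BC \log_2 M_i (1 - p_i)\right) \right\}} \tag{12}$$

$$= \underbrace{\frac{BC \log_2 M_{\bar{i}} (1 - p_{\bar{i}})}{\mathbb{E}[b] \left(F_v(v_{\bar{i}+1}) - F_v(v_{\bar{i}})\right)(1 - p_{\bar{i}}^{r+1})}}_{\text{Throughput component}} \; \underbrace{\sum_{i=1}^{S} (1 - p_i^{r+1}) \int_{v_i}^{v_{i+1}} v f_v(v)\, dv}_{\text{Quality component}} \tag{13}$$

where $\bar{i} = \arg\max_i \left\{ \left(F_v(v_{i+1}) - F_v(v_i)\right)(1 - p_i^{r+1}) / \left(BC \log_2 M_i (1 - p_i)\right) \right\}$ denotes the stream with
the longest average transmission time.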
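
As a numerical illustration, the sketch below evaluates the perceived throughput for a thresholding policy using an empirical sample of loss visibility values in place of the density $f_v$. This substitution is an assumption made for simplicity: the class fraction $F_v(v_{i+1}) - F_v(v_i)$ and the cumulative value $V_i$ are replaced by sample averages. The function name, argument names, and the interpretation of $B$ as a symbol rate in Hz are likewise assumptions.

```python
import math


def perceived_throughput(values, thresholds, pers, mod_orders, code_rate,
                         bandwidth_hz, mean_packet_bits, retry_limit):
    """Sum of p_i^s V_i divided by the worst-case mean class transmission time."""
    num_streams = len(thresholds) - 1
    total = len(values)
    quality = 0.0
    worst_time = 0.0
    for i in range(num_streams):
        low, high = thresholds[i], thresholds[i + 1]
        is_last = i == num_streams - 1
        # Class i holds packets with v_i <= v < v_{i+1}; v = 1 joins the last class.
        members = [v for v in values if low <= v < high or (is_last and v == high)]
        fraction = len(members) / total      # empirical F_v(v_{i+1}) - F_v(v_i)
        cumulative_value = sum(members) / total  # empirical V_i
        p = pers[i]
        success = 1.0 - p ** (retry_limit + 1)   # p_i^s
        mean_tx = retry_limit + 1 if p == 1.0 else success / (1.0 - p)
        rate = bandwidth_hz * code_rate * math.log2(mod_orders[i])
        class_time = mean_packet_bits * fraction * mean_tx / rate  # t_i
        quality += success * cumulative_value
        worst_time = max(worst_time, class_time)
    return quality / worst_time


sample = [0.1, 0.2, 0.8, 0.9]
policy = [0.0, 0.5, 1.0]
# Reliable stream carries the more visible class ...
pt_prioritized = perceived_throughput(sample, policy, [0.5, 0.0], [4, 4],
                                      1.0, 1.0, 2.0, 0)
# ... versus the reverse assignment of the same two streams.
pt_reversed = perceived_throughput(sample, policy, [0.0, 0.5], [4, 4],
                                   1.0, 1.0, 2.0, 0)
print(pt_prioritized, pt_reversed)
```

With identical transmission times in both cases, the difference between the two results comes entirely from the quality component, that is, from which class is sent over the more reliable stream.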



IV. Loss Visibility-based Thresholding Policy 

In this section, the loss visibility-aware video transmission problem is formulated and the 
optimal thresholding policy is derived. 
A. Problem Formulation

We propose to solve the problem 

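$$\max_{\mathbf{v}, \mathbf{M}, C, S} \; PT(\mathbf{v}, \mathbf{M}, C, S) \tag{14}$$
$$\text{s.t.} \quad 0 \le v_i \le v_{i+1} \le 1 \quad \forall i = 1, \ldots, S \tag{15}$$
$$M_i \in \mathcal{M} \quad \forall i = 1, \ldots, S; \quad C \in \mathcal{C}. \tag{16}$$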

The objective is to select the number of packet classes S, the corresponding set of thresholds v 
and modulation orders M, and the coding rate C such that the perceived throughput is maximized. 

B. The Optimal Thresholding Policy: A Load Balancing Solution 

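We derive the optimal thresholding policy $\mathbf{v}^*$ given any continuous loss visibility distribution.
The gradient of $PT(\mathbf{v}, \mathbf{M}, C, S)$ with respect to $v_i$ in the interval $[0, 1]$ is
$\partial PT/\partial v_i = (h\, \partial g/\partial v_i - g\, \partial h/\partial v_i)/h^2$, where
$g = \sum_{i=1}^{S} (1 - p_i^{r+1}) \int_{v_i}^{v_{i+1}} v f_v(v)\, dv$ and
$h = \mathbb{E}[b] (F_v(v_{\bar{i}+1}) - F_v(v_{\bar{i}}))(1 - p_{\bar{i}}^{r+1})/(BC \log_2 M_{\bar{i}} (1 - p_{\bar{i}}))$
are the numerator and denominator of (12), and $\bar{i} = \arg\max_i t_i$ is the stream with the longest
average transmission time. The components of the gradient are

$$\frac{\partial g}{\partial v_i} = \left(p_i^{r+1} - p_{i-1}^{r+1}\right) v_i f_v(v_i) \tag{17}$$

where we used the fact that $\partial\left(\int_{v_{i-1}}^{v_i} v f_v(v)\, dv\right)/\partial v_i = \lim_{\epsilon \to 0} \left(\int_{v_i}^{v_i + \epsilon} v f_v(v)\, dv / \epsilon\right) = v_i f_v(v_i)$.
Furthermore, the gradient corresponding to $h$ is

$$\frac{\partial h}{\partial v_i} = \begin{cases} \mathbb{E}[b] f_v(v_i)(1 - p_{i-1}^{r+1}) / \left(BC \log_2 M_{i-1} (1 - p_{i-1})\right) & \text{if } i = \bar{i} + 1 \\ -\mathbb{E}[b] f_v(v_i)(1 - p_i^{r+1}) / \left(BC \log_2 M_i (1 - p_i)\right) & \text{if } i = \bar{i} \\ 0 & \text{otherwise.} \end{cases} \tag{18}$$

In Lemma 1 and Lemma 2, we derive properties of the gradient $\partial PT/\partial v_i$ that will be used to
find the thresholds $v_i$ that maximize the perceived throughput expression in Theorem 1.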

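Lemma 1. If the streams are ordered by the post-retransmission success probability, i.e., $p_{i-1}^s \le p_i^s$
$\forall i = 2, \ldots, S$, then the gradient $\partial PT/\partial v_i$ satisfies the following properties:

1) $\partial PT/\partial v_{\bar{i}} > 0$ where $\bar{i} = \arg\max_i t_i$

2) $\partial PT/\partial v_i < 0 \;\; \forall i \ne \bar{i}$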

Proof: See Appendix A. ■ 
We use Lemma 1 to derive a more general condition on the behavior of the gradient for the
case where $\exists\, j \ne \bar{i}$ s.t. $\bar{i}, j \in \arg\max_i t_i$, i.e., more than one stream has the same average
transmission time. This extension will be key to proving the result in Theorem 1.

Lemma 2. Define X = {i s.t. tj = maxj tj}. If {vf, i 6 X or i — 1 Gl} are jointly scaled to 
keep X fixed, then 

1) dPT/diii >OifieXandi-l£X 

2) dPT/dv t <Oifi^Xandi-leX 

Proof: See Appendix B. ■ 

Theorem 1 provides the optimal thresholding policy among streams and applies for any
continuous loss visibility distribution obtained using kernel density estimation.

Theorem 1. Thresholding Policy: The optimal loss visibility thresholds $\mathbf{v}^* = \{v_i^*\}_{i=2}^{S}$ satisfy

$$F_v(v_{i+1}^*) - F_v(v_i^*) = \frac{\log_2 M_i / n_i}{\sum_{j=1}^{S} \log_2 M_j / n_j} \quad \forall i = 1,\dots,S \tag{19}$$

where $n_i = (1-p_i^{r+1})/(1-p_i)$.

Proof: See Appendix C. ■
The solution is such that the post-retransmission throughput is equal among streams. Thus, the
thresholds are selected to balance the load among spatial streams in proportion to the achievable
throughput on each stream and the corresponding fraction of packets in each of the S classes.
Correspondingly, the solution is referred to as the load balancing solution. Figure 2 shows a
toy example of the steps taken to converge to the optimal solution for four packet classes
(i.e., four spatial streams) where the loss visibility distribution is concentrated in three regions
corresponding, for example, to I, P, and B frame packets. The process converges to the same
solution for any initial non-identical thresholds. Based on the components of (19), the load
balancing property applies in three different aspects:

1) Non-uniform loss visibility distribution: The loss visibility thresholds are selected to balance
the fraction of packets through each stream based on the loss visibility distribution.
In Figure 2, for example, since the fraction of B frame packets is higher than that of P and I
frames, $v_2 - v_1$ is made small enough to ensure that $F_v(v_2) - F_v(v_1)$ is comparable to
$F_v(v_{i+1}) - F_v(v_i)$ for $i > 1$ so that the load is balanced among streams.

2) Unequal modulation per stream: If the SNR on spatial stream $i$ allows supporting a
higher modulation order $M_i$, the fraction of packets through stream $i$, $F_v(v_{i+1}) - F_v(v_i)$,
is increased in proportion to $\log_2 M_i$ according to (19).

3) Retransmission overhead: If a particularly low SNR on spatial stream $i$ incurs a large
retransmission overhead $n_i$, the fraction of packets through stream $i$, $F_v(v_{i+1}) - F_v(v_i)$, is
reduced accordingly.

Fig. 2. Toy example and sketch of the proof of the load balancing solution in Theorem 1.
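The load balancing rule in (19) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names `avg_transmissions` and `load_balanced_thresholds` are hypothetical, the inverse CDF of the loss visibility distribution is supplied by the caller, the lowest threshold is fixed at zero so that the class fractions sum to one, and the two-stream numbers are made up.

```python
import numpy as np


def avg_transmissions(per, r):
    """n_i = (1 - p_i^(r+1)) / (1 - p_i): mean number of transmissions per
    packet on stream i when at most r retransmissions are allowed."""
    per = np.asarray(per, dtype=float)
    return (1.0 - per ** (r + 1)) / (1.0 - per)


def load_balanced_thresholds(mod_orders, per, r, inv_cdf):
    """Return thresholds v_1..v_{S+1} and class fractions following (19).

    inv_cdf is the inverse of the loss visibility CDF F_v. It is an
    assumption of this sketch that the caller provides it.
    """
    rate = np.log2(np.asarray(mod_orders, dtype=float)) / avg_transmissions(per, r)
    fractions = rate / rate.sum()              # F_v(v_{i+1}) - F_v(v_i)
    cdf_levels = np.concatenate(([0.0], np.cumsum(fractions)))
    thresholds = np.array([inv_cdf(q) for q in cdf_levels])
    return thresholds, fractions


# Hypothetical two-stream example with a uniform distribution on [0, 1],
# for which F_v(v) = v and the inverse CDF is the identity.
mods, per, r = [2, 4], [0.2, 0.05], 1
thresholds, fractions = load_balanced_thresholds(mods, per, r, inv_cdf=lambda q: q)

# Per-stream load: fraction of packets times transmissions per coded bit.
loads = fractions * avg_transmissions(per, r) / np.log2(mods)
print(thresholds, loads)
```

The `loads` vector comes out equal across streams, which is the load balancing property: a stream with a higher modulation order or fewer retransmissions is assigned a proportionally larger share of the packets.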

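Under the load balancing solution in Theorem 1, we have

$$PT(\mathbf{v}^*,\mathbf{M},C,S) = \underbrace{\frac{BC}{E[b]}\sum_{i=1}^{S}\frac{1-p_i}{1-p_i^{r+1}}\log_2 M_i}_{\text{Post-retx sum throughput}} \times \underbrace{\sum_{i=1}^{S}(1-p_i^{r+1})\int_{v_i^*}^{v_{i+1}^*} v f_v(v)\,dv}_{\text{Loss-penalized quality measure}}. \tag{20}$$

We note that for the special case of full retransmission, i.e., $r = \infty\ \forall i$, we have $1 - p_i^{r+1} \to 1$ and (20) reduces to the sum
throughput scaled by the aggregate packet value as follows

$$PT(\mathbf{v}^*,\mathbf{M},C,S) = \frac{BC}{E[b]}\sum_{i=1}^{S}(1-p_i)\log_2 M_i \times \sum_{i=1}^{S}\int_{v_i^*}^{v_{i+1}^*} v f_v(v)\,dv. \tag{21}$$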
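As a numerical sanity check of the factorization in (20), the Python sketch below evaluates the perceived throughput both as the ratio g/h and in factored form at the load-balanced thresholds. The assumptions are ours, not the paper's: a uniform loss visibility distribution on [0, 1] with the lowest threshold at zero, E[b] read as the mean packet size in bits, and made-up values for the error rates and packet size. All function names are hypothetical.

```python
import numpy as np


def stream_terms(mod_orders, per, r):
    """Bits per symbol, post-retransmission success probability, and n_i."""
    per = np.asarray(per, dtype=float)
    bits = np.log2(np.asarray(mod_orders, dtype=float))
    success = 1.0 - per ** (r + 1)
    n = success / (1.0 - per)
    return bits, success, n


def pt_direct(v, mod_orders, per, r, B, C, Eb):
    """PT = g / h for a uniform distribution (F_v(v) = v)."""
    v = np.asarray(v, dtype=float)
    bits, success, n = stream_terms(mod_orders, per, r)
    class_value = (v[1:] ** 2 - v[:-1] ** 2) / 2.0   # integral of v dv per class
    g = np.sum(success * class_value)
    h = np.max(Eb * np.diff(v) * n / (B * C * bits))  # slowest stream
    return g / h


def pt_factored(v, mod_orders, per, r, B, C, Eb):
    """Sum throughput times loss-penalized quality, as in (20)."""
    v = np.asarray(v, dtype=float)
    bits, success, n = stream_terms(mod_orders, per, r)
    class_value = (v[1:] ** 2 - v[:-1] ** 2) / 2.0
    return (B * C / Eb) * np.sum(bits / n) * np.sum(success * class_value)


# Made-up operating point (not taken from the paper).
mods, per, r = [4, 16, 16], [0.10, 0.05, 0.01], 4
B, C, Eb = 1e6, 0.75, 8000.0

bits, _, n = stream_terms(mods, per, r)
fractions = (bits / n) / np.sum(bits / n)
v_star = np.concatenate(([0.0], np.cumsum(fractions)))

print(pt_direct(v_star, mods, per, r, B, C, Eb))
print(pt_factored(v_star, mods, per, r, B, C, Eb))
```

The two printed values agree only at the load-balanced thresholds. Away from them, the slowest stream alone determines h and the factored form no longer applies.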



C. MIMO Mode Selection and Link Adaptation 

Next, we propose adapting the selection of the modulation order per stream, the coding rate, and
the MIMO mode. The objective of link adaptation is to maximize the sum throughput across
streams. Given the post-processing SNR on stream $i$ and the coding rate $C$, maximizing the
sum throughput is equivalent to selecting the modulation order per stream to maximize the
corresponding throughput per stream, that is,

$$M_i(C) = \arg\max_{M_i \in \mathcal{M}} \left\{\log_2 M_i\,\big(1 - p(M_i, C, \gamma_i(\mathbf{H}))\big)\right\}. \tag{22}$$

The coding rate is then selected to maximize the sum throughput given the corresponding optimal
modulation order per stream as follows

$$C^* = \arg\max_{C \in \mathcal{C}} \left\{C \sum_{i} \log_2 M_i(C)\,\big(1 - p(M_i(C), C, \gamma_i(\mathbf{H}))\big)\right\},$$

$$M_i^* = M_i(C^*).$$
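The two-stage selection in (22) and the coding rate rule can be sketched in Python as follows. The constellation and coding rate sets are those listed in Section VI, but the packet error rate model `toy_per` is a purely hypothetical placeholder (the paper's PER function p(M, C, γ) is not reproduced here), and `select_mcs` is an illustrative name.

```python
import math

MOD_ORDERS = (2, 4, 16, 64)            # BPSK, 4-QAM, 16-QAM, 64-QAM
CODE_RATES = (1 / 2, 2 / 3, 3 / 4, 5 / 6)


def toy_per(M, C, snr_db):
    """Hypothetical PER curve, NOT from the paper: a logistic in SNR whose
    midpoint grows with the spectral efficiency C * log2(M)."""
    midpoint_db = 3.0 * C * math.log2(M) + 1.0   # made-up threshold
    return 1.0 / (1.0 + math.exp(2.0 * (snr_db - midpoint_db)))


def select_mcs(snrs_db, per=toy_per):
    """Return (C*, [M_i*], objective) following (22) and the rule for C*."""
    best = None
    for C in CODE_RATES:
        # Per-stream modulation maximizing log2(M) * (1 - PER), as in (22).
        mods = [
            max(MOD_ORDERS, key=lambda M: math.log2(M) * (1.0 - per(M, C, g)))
            for g in snrs_db
        ]
        total = C * sum(
            math.log2(M) * (1.0 - per(M, C, g)) for M, g in zip(mods, snrs_db)
        )
        if best is None or total > best[2]:
            best = (C, mods, total)
    return best


snrs = [10.47, 16.33, 18.41]           # post-processing SNRs in dB
C_star, M_star, objective = select_mcs(snrs)
print(C_star, M_star, objective)
```

Because the objective is separable across streams for a fixed coding rate, the per-stream maximization followed by a scan over the coding rates is equivalent to a joint search over all modulation and coding combinations.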
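Substituting $(\{M_i^*\}, C^*)$ into (20), we obtain

$$PT(\mathbf{v}^*,\mathbf{M}^*,C^*,S) = \frac{BC^*}{E[b]}\sum_{i=1}^{S}\frac{1-p_i}{1-p_i^{r+1}}\log_2 M_i^* \times \sum_{i=1}^{S}(1-p_i^{r+1})\int_{v_i}^{v_{i+1}} v f_v(v)\,dv. \tag{23}$$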

Practical MIMO link adaptation should include a mechanism for switching the mode, i.e., the number
of spatial streams, based on the channel state matrix H to optimize system performance and provide a
suitable diversity-multiplexing tradeoff. This allows a continuum of operating points that provide
different data rate and reliability combinations from single stream beamforming to full spatial
multiplexing. In this work, the MIMO mode selection criterion is intended to capture video
quality as well as throughput. For instance, if the loss visibility distribution exhibits higher
variability, it may be preferable to use more streams to provide prioritized delivery by adding
more packet classes if the channel quality is good. On the other hand, if the variability in
packet importance is low, then the contribution of packet prioritization is minimal and reliable
delivery with a smaller number of spatial streams may be preferred. Thus, mode selection can
adapt according to the video source in a content-aware manner. Consequently, the mode selection
criterion is to maximize the perceived throughput expression, that is,

$$S^* = \arg\max_{S} \left\{ PT(\mathbf{v}^*,\mathbf{M}^*,C^*,S) \ \ \text{s.t.} \ \ BC^*\sum_{i=1}^{S}\frac{1-p_i}{1-p_i^{r+1}}\log_2 M_i^* \ge R \right\} \tag{24}$$

where $R$ is the video source rate. The constraint $BC^*\sum_{i=1}^{S}\frac{1-p_i}{1-p_i^{r+1}}\log_2 M_i^* \ge R$ ensures that the
throughput with the selected mode at least matches the rate of the video, so that the wireless
link can serve the requirements of the video source.
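The mode selection rule in (24) amounts to a constrained argmax over a handful of candidate modes. The Python sketch below illustrates this with made-up per-mode operating points. The helper names (`supported_rate_bps`, `select_mode`) and every number are hypothetical, and the perceived throughput values are simply given rather than computed from (23).

```python
import math


def supported_rate_bps(B, C, mod_orders, per, r):
    """Left-hand side of the constraint in (24): B * C * sum_i log2(M_i) / n_i."""
    total = 0.0
    for M, p in zip(mod_orders, per):
        n = (1.0 - p ** (r + 1)) / (1.0 - p)   # mean transmissions per packet
        total += math.log2(M) / n
    return B * C * total


def select_mode(candidates, source_rate_bps):
    """candidates maps S -> (perceived_throughput, supported_rate_bps).
    Returns the feasible S with the largest perceived throughput, or None
    when no mode can carry the source rate."""
    feasible = {
        S: pt for S, (pt, rate) in candidates.items() if rate >= source_rate_bps
    }
    return max(feasible, key=feasible.get) if feasible else None


# Hypothetical operating points per mode (all values are made up).
B, r = 1e6, 4
modes = {
    1: dict(C=3 / 4, mods=[64], per=[0.01], pt=1.9e6),
    2: dict(C=3 / 4, mods=[16, 4], per=[0.02, 0.10], pt=2.4e6),
    3: dict(C=1 / 2, mods=[4, 2, 2], per=[0.05, 0.2, 0.4], pt=2.6e6),
}
candidates = {
    S: (m["pt"], supported_rate_bps(B, m["C"], m["mods"], m["per"], r))
    for S, m in modes.items()
}
best_mode = select_mode(candidates, source_rate_bps=2.0e6)
print(best_mode)
```

In this made-up example the three-stream mode has the largest perceived throughput but cannot sustain a 2 Mbps source, so the two-stream mode is selected. This is the role the rate constraint plays in (24).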

D. Loss Visibility Optimized Video Transmission Algorithm 

In this section, we describe the proposed algorithm for loss visibility-optimized video transmis- 
sion over MIMO systems which involves selecting the optimal thresholding policy and the MCS 
per stream given the post-processing SNRs corresponding to the MIMO channel decomposition. 

The algorithmic description is provided in Algorithm 1. Given a certain number of spatial
streams S, the corresponding precoder $F_S$ is computed to maximize the minimum singular value.
The corresponding post-processing SNRs per stream are computed. The modulation orders are
selected to maximize the per-stream throughput and the coding rate is selected to maximize the
overall throughput.

The streams are ordered according to the post-retransmission success probability. Given the
modulation orders per stream and the loss visibility distribution, the optimal thresholding policy is
computed according to Theorem 1. This determines the values of the thresholds for transmission
through each stream. After the process is repeated for each mode, the mode that maximizes the
objective function and supports the video source rate is chosen according to (24). The block
of packets corresponding to a channel coherence time is transmitted according to the selected
MCSs, thresholding policy, and MIMO mode. Given the values of the incoming packets, the
estimated loss visibility distribution is updated using kernel density estimation at each channel
coherence time.
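The distribution update step can be sketched with a plain NumPy kernel density estimator. This is only an illustration under our own assumptions: a Gaussian kernel, the normal reference rule of thumb for the bandwidth, renormalization of the CDF to the interval [0, 1], and synthetic beta-distributed scores standing in for real loss visibility values. None of these choices is specified in the text above.

```python
import numpy as np


def kde_pdf(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of f_v evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    if bandwidth is None:
        # Normal reference rule of thumb (an assumption of this sketch).
        bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-0.2)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (samples.size * bandwidth)


def cdf_on_grid(pdf, grid):
    """Trapezoidal CDF, renormalized so that F_v(1) = 1 on the grid."""
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    return cdf / cdf[-1]


# Synthetic loss visibility scores in [0, 1] (placeholder data).
rng = np.random.default_rng(0)
scores = rng.beta(2.0, 5.0, size=500)

grid = np.linspace(0.0, 1.0, 201)
pdf = kde_pdf(scores, grid)
cdf = cdf_on_grid(pdf, grid)
```

The resulting `cdf` array is the estimate of $F_v$ on the grid, and it can be inverted by interpolation to obtain thresholds from the class fractions in Theorem 1.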
Algorithm 1 Loss Visibility Optimized Video Transmission over MIMO.

Given channel state H
for S = 1 → min{N_t, N_r} do
    Step 1. Precoder Computation
        Compute precoder $F_S$ and post-processing SNRs $\gamma(\mathbf{H}) = \{\gamma_i(\mathbf{H})\}_{i=1}^{S}$
    Step 2. MCS Selection
        for $C \in \mathcal{C}$ do
            $M_i(C) = \arg\max_{M_i \in \mathcal{M}} \{\log_2 M_i\,(1 - p(M_i, C, \gamma_i(\mathbf{H})))\}$
        end for
        $C^* = \arg\max_{C \in \mathcal{C}} \{C \sum_i \log_2 M_i(C)\,(1 - p(M_i(C), C, \gamma_i(\mathbf{H})))\}$
        $M_i^* = M_i(C^*)$
        Order streams according to post-retransmission success probability, i.e., $1 - p_i^{r+1} < 1 - p_{i+1}^{r+1}\ \forall i = 1,\dots,S-1$.
    Step 3. Loss Visibility Distribution Update
        Use kernel density estimation to update the loss visibility distribution $f_v(v)$
    Step 4. Thresholding Policy Selection
        Compute $\mathbf{v}^* = \{v_i^*\}_{i=2}^{S}$ to satisfy $F_v(v_{i+1}^*) - F_v(v_i^*) = \frac{\log_2 M_i^*/n_i}{\sum_{j=1}^{S} \log_2 M_j^*/n_j}\ \forall i = 1,\dots,S$
end for
Step 5. Mode Selection
    Select the optimal mode $S^*$ according to (24).

V. Video Quality and Throughput Gain Expressions

To quantify the gains from using the loss visibility side information as proposed in Algorithm
1, we compare with conventional MIMO signaling whereby no side information is used for
packet prioritization. Instead, the symbols corresponding to each packet are mapped to all spatial
streams. In this section, the notation $PT_{LV}$ refers to the perceived throughput expression under
the loss visibility-based policy while $PT_{SM}$ refers to the perceived throughput expression under
conventional non-prioritized MIMO signaling. The expression for $PT_{LV}$ under the loss visibility
optimized policy is as stated in (23).

A. Gain Expressions with Full CSIT 

We consider a baseline whereby no prioritization is applied at the PHY layer, i.e., symbols
corresponding to each packet are mapped to all spatial streams. We allow unequal modulations
per stream, i.e., the modulation scheme on each stream maximizes the per-stream throughput.
In this case, for a representative set of packets $\mathcal{V}$, the cumulative value of packets received
successfully is $|\mathcal{V}|(1 - p^{r+1})\int_0^1 v f_v(v)\,dv$ where $\int_0^1 v f_v(v)\,dv$ is the average packet value and
$p$ is the corresponding PER using the non-prioritized transmission policy. We note that $p =
1 - \prod_{i=1}^{S}(1 - p_i)^{1/S}$. Furthermore, the transmission time is $\max_i\{E[b](1 - p^{r+1})/(CB\log_2 M_i\,(1 - p))\}\,|\mathcal{V}|/S$. Thus, the corresponding expression is

$$PT_{SM} = S\left((1 - p^{r+1})\int_0^1 v f_v(v)\,dv\right)\min_i\left\{\frac{CB\,(1-p)}{E[b]\,(1 - p^{r+1})}\log_2 M_i\right\}. \tag{25}$$

Now, we write the gain in perceived throughput $G = E_{\mathbf{H}}[PT_{LV}]/E_{\mathbf{H}}[PT_{SM}]$ as follows

$$G = \underbrace{\frac{E_{\mathbf{H}}\left[\sum_{i=1}^{S} p_i^s V_i\right]}{E_{\mathbf{H}}\left[p^s V\right]}}_{\text{Packet Prioritization Gain } G_{PP}} \times \underbrace{\frac{E_{\mathbf{H}}\left[\max_C\left\{C\sum_{i=1}^{S}\max_{M_i}\{\log_2 M_i/n_i\}\right\}\right]}{E_{\mathbf{H}}\left[S\max_C\left\{C\min_i\left\{\max_{M_i}\{\log_2 M_i/n_i\}\right\}\right\}\right]}}_{\text{Load Balancing Gain } G_{LB}} = G_{PP} \times G_{LB} \tag{26}$$

where $V_i = \int_{v_i}^{v_{i+1}} v f_v(v)\,dv$, $V = \int_0^1 v f_v(v)\,dv$, $p_i^s = 1 - p_i^{r+1}$, and $p^s = 1 - p^{r+1}$.

The first component of (26) is referred to as the packet prioritization gain. It results from the fact
that the more relevant packets are transmitted through the more reliable streams. Because streams
are ordered by the post-retransmission success probability $1 - p_i^{r+1}$, the packet prioritization
gain is always greater than 1. This provides another justification for the proposed ordering. We
note that this gain is highest when both the packet values and the per-stream SNRs exhibit high
variability. Furthermore, if infinite retransmissions are allowed on all streams, this gain converges
to one since all packets are eventually received successfully.

The second component of the expression is referred to as the load balancing gain. It cor- 
responds to the throughput averaged over spatial streams divided by the throughput on the 
worst spatial stream. It results from the fact that the optimal thresholding policy balances the 
load among streams in proportion to the throughput achievable on each stream. Conversely, in 
conventional transmission schemes, the performance achieved is limited by the performance on 
the worst stream. This justifies why the load balancing gain is the achievable throughput averaged 
over all streams divided by the achievable throughput on the worst stream. 

Furthermore, we note that the packet prioritization gain can be thought of as a reduction 
in loss visibility, i.e., a video quality gain whereas the load balancing gain is interpreted as a 
throughput gain. 
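For a single channel realization, the two factors of (26) can be evaluated directly. The Python sketch below does so under simplifying assumptions that go beyond the text: the expectation over H and the maximization over C are dropped (the same coding rate is assumed for both schemes, so it cancels), the lowest threshold is zero so that the class values sum to V, and the inputs are made-up numbers. The function name is hypothetical.

```python
import numpy as np


def single_realization_gains(mod_orders, per, r, class_values):
    """Per-realization versions of G_PP and G_LB from (26).

    class_values[i] stands for V_i, the integral of v * f_v(v) over class i.
    """
    per = np.asarray(per, dtype=float)
    class_values = np.asarray(class_values, dtype=float)
    S = per.size

    success_i = 1.0 - per ** (r + 1)              # p_i^s
    n_i = success_i / (1.0 - per)                 # mean transmissions per packet
    rate_i = np.log2(np.asarray(mod_orders, dtype=float)) / n_i

    # Baseline PER when each packet is spread over all streams.
    p_sm = 1.0 - np.prod(1.0 - per) ** (1.0 / S)
    success_sm = 1.0 - p_sm ** (r + 1)            # p^s

    g_pp = np.dot(success_i, class_values) / (success_sm * class_values.sum())
    g_lb = rate_i.sum() / (S * rate_i.min())
    return g_pp, g_lb


# Stream 2 is more reliable and carries the more valuable class (made-up values).
g_pp, g_lb = single_realization_gains(
    mod_orders=[4, 16], per=[0.3, 0.05], r=0, class_values=[0.1, 0.4]
)
print(g_pp, g_lb)
```

With identical streams both factors equal one. The gains appear only when the streams differ in reliability or supported modulation order, which matches the discussion above.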
B. Gain Expressions with Limited Feedback

The expressions for $PT_{SM}$ and $PT_{LV}$ are in terms of the error probability $p_i$, which in turn
depends on the post-processing SNR vector $\gamma = \{\gamma_i(\mathbf{H})\}_{i=1}^{S}$. Thus, they apply equivalently under
limited feedback given that $\gamma_i(\mathbf{H})$ is computed according to the selected precoder. We
compute the gains under limited feedback by taking the expectation of the individual gains
for each channel state given its mapping to the corresponding codeword. This corresponds to
$G = E_{\mathcal{F}}[PT_{LV}]/E_{\mathcal{F}}[PT_{SM}]$, i.e.,

$$G = \underbrace{\frac{E_{\mathcal{F}}\left[\sum_{i=1}^{S} p_i^s V_i\right]}{E_{\mathcal{F}}\left[p^s V\right]}}_{\text{Packet Prioritization Gain } G_{PP}} \times \underbrace{\frac{E_{\mathcal{F}}\left[\max_C\left\{C\sum_{i=1}^{S}\max_{M_i}\{\log_2 M_i/n_i\}\right\}\right]}{E_{\mathcal{F}}\left[S\max_C\left\{C\min_i\left\{\max_{M_i}\{\log_2 M_i/n_i\}\right\}\right\}\right]}}_{\text{Load Balancing Gain } G_{LB}} = G_{PP} \times G_{LB}. \tag{27}$$



VI. Results and Analysis 

In this section, we first evaluate the proposed loss visibility based MIMO transmission policies
using H.264 encoded bit streams under different antenna configurations. Next, we present
numerical results to quantify the load balancing and packet prioritization gains. Each entry of
the channel matrix corresponds to a flat Rayleigh fading channel. The system bandwidth is 1
MHz. The set of possible M-QAM constellations is $\mathcal{M} = \{2, 4, 16, 64\}$ corresponding to BPSK,
4-QAM, 16-QAM, and 64-QAM. The set of possible coding rates is $\mathcal{C} = \{1/2, 2/3, 3/4, 5/6\}$.

A. Tests on H.264 Encoded Video Sequences 

To evaluate the video quality gain from the loss visibility based prioritization policy, we test
the proposed algorithm on the Foreman video sequence [30] encoded with H.264/AVC. The GoP
structure is IBPBP... and the GoP duration is 16 frames. The MB size is 16 x 16 and we
use the CIF resolution of 352 x 288. The video frame is divided into horizontal slices where
each slice is 22 MBs wide and 1 MB high. Thus, each frame corresponds to 18 slices and
each slice is transmitted as one packet. The decoder uses motion copy error concealment. Loss
visibility estimation is applied based on [1] as described in §II-B. Figure 3 shows the resulting
loss visibility scores for each frame/slice for the Foreman video sequence. Several observations
are in order.




Fig. 3. Loss visibility map of the Foreman video sequence encoded with H.264/AVC using a IBPBP GoP structure with 18 
horizontal slices per frame and a GoP duration of 16. 



1) Frame type: The variability of the visibility across frames is clear. For instance, the I 
frames can be noticed as dark red every GoP interval. Furthermore, the odd-numbered 
frames corresponding to P have higher loss visibility than the even-numbered B frames. 

2) Subject/background motion: Face motion between Frame 1 and Frame 170 causes high loss
visibility for some slices depending on the spatial location of motion. Background motion
between Frame 170 and Frame 220 contributes an overall increase in loss visibility. Beyond
that, the lack of object and background motion causes an overall drop in loss visibility.

3) Error propagation: For odd-numbered P frames, it can be noticed that the packet loss 
visibility captures the severity of potential error propagation by decaying for P frames 
towards the end of the GoP, i.e., close to the next reference frame. 

Figure 4 applies the loss visibility based prioritization policy to the Foreman video sequence
[30] for a 4 x 4 MIMO system, S = 3 streams/classes, and $E_s/N_0 = 5$ dB. The retransmission
limit is r = 4 and the channel coherence time is equal to 1 GoP corresponding to a low mobility
environment. Figure 4(a) shows the mapping of each video packet to the corresponding spatial
stream. Packets mapped to the best spatial stream are referred to as high priority packets and
vice versa. The corresponding video quality is shown in Figure 4(c) in comparison with the
baseline, whereby the symbols corresponding to each packet are mapped to all spatial streams,
for the same channel realization. Despite having 460 packet losses post-retransmission, the mean
video quality with prioritization is 0.997 on the MS-SSIM scale whereas the mean video quality
without prioritization is 0.802. With packet prioritization, losses affect only packets where error
concealment can conceal the loss from being visible to the average viewer. In contrast, the error
propagation effect is very severe in the case of no prioritization. The received and concealed
frames with index 223 of the Foreman sequence are shown in Figure 4(b) to further demonstrate
the difference in video quality.

Fig. 4. Case study of the loss visibility-based prioritization policy for the Foreman video sequence with 4x4 MIMO system,
S = 3 streams, and $E_s/N_0 = 5$ dB. The retransmission limit is r = 4. (a) Packet-stream mapping. (b) Received frame 223
with and without prioritization. (c) Comparison of video quality of the received videos (no prioritization vs. LV-based
prioritization).

Figure 5 demonstrates the video quality gains for a range of antenna configurations for the
Foreman video sequence encoded with the same properties as previously described. The video
quality at each data point is the frame-averaged quality further averaged over 10 different channel
realizations. The same channel realizations are used for the two cases. The first observed trend is
that for a fixed antenna configuration, the gains are maximized when $S = \min(N_t, N_r)$. This is
because the large variability in the post-processing SNRs across streams enables more effective
packet prioritization. Furthermore, increasing the number of antennas for a fixed number of
streams improves video quality but reduces the video quality gain. The maximum gain is reported
for a 2 x 2 setting where a video quality of 0.9 requires $E_s/N_0 = 3$ dB with prioritization versus
$E_s/N_0 = 20$ dB without prioritization. Furthermore, gains in excess of 10 dB are achieved
over a range of antenna configurations.

Fig. 5. Comparison of the loss visibility-based thresholding policy vs. precoded MIMO transmission for H.264-encoded Foreman
sequence for different antenna configurations over a range of SNRs (horizontal axis: average SNR $E_s/N_0$). The retransmission
limit is r = 4 and the channel coherence time is 1 GoP.

B. Perceived Throughput Objective and the Load Balancing Policy 

First, we examine the perceived throughput objective and the optimal thresholding policy
derived in Theorem 1 to obtain insight into the behavior of the loss visibility-based thresholding
policy. In Figure 6, we show a plot of the perceived throughput expression (13) for a
uniform loss visibility distribution vs. $v_2$ and $v_3$ for a MIMO system with $N_t = N_r = S = 3$,
C = 3/4 and a packet drop threshold $v_1 = 0.1$. The specific channel realization yields
$\gamma(\mathbf{H}) = [10.47\ 16.33\ 18.41]^T$ dB resulting in optimal per-stream constellation sizes of
$\mathbf{M}^* = [4\ 16\ 16]^T$. Note that the objective function is only plotted in the feasible operating
region $v_i < v_{i+1}$. The peak of the objective occurs at $\mathbf{v} = [0.1\ 0.28\ 0.64]^T$. Note that for a
uniform loss visibility distribution, we have $F_v(v_i) = v_i$. To see that the result matches that
in Theorem 1, we have $2(v_2 - v_1) = v_3 - v_2 = 1 - v_3 = 0.36$. Since the first spatial stream
supports only 4-QAM, the load balancing solution ensures the fraction of packets transmitted
through the first spatial stream is half that transmitted through the second and the third. Note that,
for this channel realization, $p_i \ll 1\ \forall i$. Thus, $n_i \approx 1\ \forall i$ and the retransmission overhead has
only a minor effect on the optimal thresholding policy.
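The reported peak can be reproduced with a brute-force search over the two free thresholds. The Python sketch below assumes, as the text does approximately, that $n_i = 1$ on every stream, so the numerator of the perceived throughput is constant for a fixed drop threshold and maximizing PT is the same as minimizing the longest per-stream transmission time. The 0.01 grid step and the function name `max_tx_time` are our choices.

```python
LOG2_M = [2, 4, 4]      # log2 of M* = [4, 16, 16]
V1 = 0.1                # packet drop threshold v_1


def max_tx_time(v2, v3):
    """Longest per-stream transmission time, up to the common factor
    E[b] / (B * C), for a uniform distribution where F_v(v) = v."""
    fractions = [v2 - V1, v3 - v2, 1.0 - v3]
    return max(f / bits for f, bits in zip(fractions, LOG2_M))


# Exhaustive search over v_1 <= v_2 <= v_3 <= 1 on a 0.01 grid.
best_time, best_v2, best_v3 = min(
    (max_tx_time(i / 100, j / 100), i / 100, j / 100)
    for i in range(10, 101)
    for j in range(i, 101)
)
print(best_v2, best_v3)   # -> 0.28 0.64
```

At the minimizer the three per-stream times coincide, so each 16-QAM stream carries twice the packet fraction of the 4-QAM stream.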

Fig. 6. Perceived throughput PT vs. loss visibility thresholds $\mathbf{v}$ for $N_t = N_r = S = 3$, $\gamma(\mathbf{H}) = [10.47\ 16.33\ 18.41]^T$ dB,
$\mathbf{M} = [4\ 16\ 16]^T$, $v_1 = 0.1$ and C = 1/2.

In Figure 7, we plot the perceived throughput achieved by the loss visibility-based thresholding
policy and by precoded spatial multiplexing for a 4 x 4 MIMO system under different numbers
of spatial streams. For beamforming (S = 1), the performance is equivalent since there is only a
single packet class. For S > 1, we clearly notice the gains with loss visibility-based thresholding
due to the load balancing property of the optimal thresholding policy. Since the load balancing
gain is equal to the average throughput among streams divided by the throughput on the worst stream,
the gains are most pronounced for S = 4. For S = 4 in the low SNR regime, the same
perceived throughput is achieved at 12 dB lower transmit power. In the high SNR regime, to
achieve $PT = 7 \times 10^6$ bps, spatial multiplexing requires 37 dB, whereas the optimal thresholding
policy requires only 28 dB, resulting in a 9 dB reduction in transmit power. In terms of perceived
throughput, at 20 dB, the gain is 4/1.8 = 2.22x and at 30 dB, the gain is 7.5/4 = 1.875x.

Fig. 7. Comparison of the perceived throughput achieved by loss visibility-based thresholding and by precoded spatial
multiplexing for a 4 x 4 MIMO system for different number of streams. (a) Loss visibility-based thresholding.
(b) Precoding/Spatial Multiplexing.

Fig. 8. Load balancing gain $G_{LB}$ analysis. (a) Different antenna configurations with S = 2 streams. (b) Different number of
streams for a 4 x 4 system. Fractional use of each modulation scheme annotated in the figure: Stream 1: 4-QAM 2.2%,
16-QAM 97.8%; Stream 2: 4-QAM 73.2%, 16-QAM 26.8%; Stream 3: BPSK 33.8%, 4-QAM 66.2%; Stream 4: BPSK 99.0%,
4-QAM 1.0%.

For multimode precoding, we select the MIMO mode to maximize perceived throughput among
all modes $S = 1, \dots, \min\{N_t, N_r\}$. Even comparing multimode precoding with the multimode
thresholding policy where gains are expected to drop, we notice a 3 dB gain in the low SNR
regime for $PT = 2 \times 10^6$ bps and a 6 dB gain in the high SNR regime for $PT = 8 \times 10^6$ bps.



C. Load Balancing and Packet Prioritization Gains 

Having shown that significant gains are achieved by the loss visibility-based video transmission
policies, we break down these gains by plotting the closed-form quality and throughput gain
expressions derived in §V. Recall that the throughput gain is achieved due to the load balancing
property of the loss visibility-based thresholding policy, thus referred to as the load balancing
gain. The quality gain manifests itself in a reduction in the average loss visibility of lost or
dropped packets and results from the fact that the more visible packets are transmitted over the
more reliable streams, thus referred to as the packet prioritization gain.

In Figure 8, we analyze the load balancing gain $G_{LB}$, defined in (26). Figure 8(a) shows
the gain for S = 2 spatial streams with different antenna configurations. Recall from the load
balancing gain expression that the gain is maximized when the per-stream throughputs exhibit the
highest variability among streams. In a two stream setup, this corresponds to the case where the
difference between the throughput on the two streams is maximal. Thus, for S = 2, a 2 x 2 system
gains more than a 4 x 4 system. In a 4 x 4 system with S = 2, the diversity and channel hardening
reduce the gains from the proposed prioritization policy because the supported modulation orders
per stream are equivalent for most channel realizations and the achievable throughput on the two
streams is comparable. In Figure 8(b), we plot the load balancing gain for a 4 x 4 system for
different numbers of spatial streams S. In the medium to high SNR regime, for the same $N_t \times N_r$
configuration, more streams provide higher gains versus non-video aware approaches since the
condition number of the effective channel $\mathbf{H}\mathbf{F}_S$ is likely to be higher, making it possible for
video-aware techniques to make use of the diverse channel statistics among streams. For S = 2
and S = 4, we show the fractional use of each modulation scheme at the peak operating points.
For S = 2 at $E_s/N_0 = -1$ dB, the best stream can support 4-QAM for most realizations while
the worst stream can only support BPSK. A similar observation follows at 8 dB and 15 dB for
16-QAM and 64-QAM. Conversely, at 4 dB (resp. 12 dB), both streams support 4-QAM (resp.
16-QAM) for most channel realizations. Thus, the gain $G_{LB}$ is close to 1.

Fig. 9. Analysis of the effect of channel coherence time on the achievability of the load balancing throughput gain for the
Foreman video sequence.

Although the analysis applies to any channel coherence time larger than one packet, the
underlying assumption in the load balancing property of Theorem 1 is that the packets observed
within a channel coherence time are representative of the loss visibility distribution. Otherwise,
the observed short-term loss visibility distribution will be different from the distribution estimated
using kernel density estimation, causing a load balancing mismatch. It follows that the gains in
Figure 8 are an upper bound that applies with a fairly large channel coherence time. For a more
realistic analysis of the load balancing throughput gain, we simulate the proposed algorithm in
Figure 9 with a variable channel coherence time ranging from S packets to several GoPs under
2 antenna configurations. For 2x2 SM, the throughput always exceeds that of the baseline but
the theoretical 2x load balancing gain reported in Figure 8 is only achieved if the channel is
fairly static for a few seconds. For a practical low mobility setup where the channel coherence time is
equal to one GoP, 1.5x out of the theoretical 2x gain is achieved. For 4x4 SM, the throughput
exceeds that of the baseline when the channel coherence time is at least 35 ms, equivalent to one video frame.
Beyond that, for a channel coherence time of one GoP, a 1.25x gain is achieved.

Fig. 10. (a) Loss visibility distribution of the Foreman video sequence; (b) Corresponding packet prioritization gain $G_{PP}$ vs.
SNR for two packet classes (S = 2 spatial streams) for r = 0 with different antenna configurations.

Next, we analyze the packet prioritization gain. As previously argued, under full retransmission,
the packet prioritization gain is equal to one since all packets are eventually delivered
successfully and the notion of prioritization disappears. Thus, we consider the case of no
retransmissions for analysis purposes. In practice, this provides an upper bound on the gain
achieved with a finite retransmission policy whereby $r < \infty$. Intuitively, we expect the packet
prioritization gain to be larger in the low SNR regime because the packet error rate is higher;
thus, under stringent delay requirements, reliable delivery is more costly to enforce. Figure 10
shows the packet prioritization gain $G_{PP}$ vs. $E_s/N_0$ for two packet classes, i.e., S = 2 spatial
streams. The corresponding loss visibility distribution as extracted from the Foreman sequence
loss visibility model is shown in Figure 10(a). Comparing different antenna configurations, we
notice that the gains are largest for 2 x 2 since the difference between the reliability of the two
streams is largest. In contrast, for a 4 x 4 system with 2 spatial streams, the diversity gain limits the
amount of extra gain achieved due to packet prioritization. For a 2 x 2 system, up to 1.6x loss
visibility reduction is achieved in the very low SNR regime and the gains diminish as the SNR
increases.

Fig. 11. Gains achieved with limited feedback for different codebook sizes and antenna configurations. (a) Load Balancing
Gain. (b) Packet Prioritization Gain.

D. Gains under Limited Feedback 

Figure 11 shows the load balancing gain and the packet prioritization gain with limited
feedback for different codebook sizes and antenna configurations. As expected, the gains increase
as the codebook size increases as well as for a larger number of spatial streams. With only 2
spatial streams in a 4 x 2 antenna configuration and a 3 bit codebook, a 27% throughput increase
and a 40% reduction in loss visibility are achieved. With 4 spatial streams in a 7 x 4 antenna
configuration and a 4 bit codebook, a 56% throughput increase and a 61% reduction in
loss visibility are achieved.

The trends of the gain plots closely follow those in Figures 8 and 10, respectively. In terms
of the nominal gain values, we observe that codebook-based limited feedback has the following
effects on the gains achievable from the loss visibility optimized policy.

• The load balancing gain drops because the unequal stream quality cannot be fully utilized 
due to channel quantization errors. Such errors cause the gap between the post processing 
SNRs on the best and worst stream to tighten, thus reducing the achievable gain. 

• The packet prioritization gain increases because the quality of the spatial streams generally 
drops, making it possible to gain more from prioritizing the video packets among streams. 

VII. Conclusion 

We proposed a cross-layer architecture for prioritized packet delivery over a MIMO PHY 
layer based on loss visibility taking advantage of the large variability in loss visibility due to the 
video source and encoder features. We presented a loss visibility-based thresholding policy that 
maps different packets to different spatial streams and derived the optimal thresholding policy 
for any loss visibility distribution. The proposed architecture requires minimal additional
cross-layer overhead while achieving quality and capacity gains. We demonstrated gains in excess
of 10 dB with different antenna configurations on H.264 encoded video sequences.
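Appendix A
Proof of Lemma 1

Lemma 1. If the streams are ordered by the post-retransmission success probability, i.e., $p_i^{r+1} < p_{i-1}^{r+1}\ \forall i = 2,\dots,N_s$, then the gradient $\partial PT/\partial v_i$ satisfies the following properties:

1) $\partial PT/\partial v_{\bar{i}} > 0$ where $\bar{i} = \arg\max_i t_i$

2) $\partial PT/\partial v_i < 0\ \forall i \ne \bar{i}$

Proof: First, we prove part 1 of the Lemma. From the expressions for $\partial g/\partial v_i$ and $\partial h/\partial v_i$, it
follows that

$$h^2\,\frac{\partial PT}{\partial v_{\bar{i}}} \overset{(a)}{\ge} \frac{E[b] f_v(v_{\bar{i}})(1-p_{\bar{i}}^{r+1})}{BC\log_2 M_{\bar{i}}\,(1-p_{\bar{i}})}\left[(1-p_{\bar{i}}^{r+1})\int_{v_{\bar{i}}}^{v_{\bar{i}+1}} v f_v(v)\,dv - \left((1-p_{\bar{i}}^{r+1}) - (1-p_{\bar{i}-1}^{r+1})\right) v_{\bar{i}}\big(F_v(v_{\bar{i}+1}) - F_v(v_{\bar{i}})\big)\right] \tag{31}$$

where (a) follows because $\sum_{i=1}^{S}(1-p_i^{r+1})V_i \ge (1-p_{\bar{i}}^{r+1})V_{\bar{i}}$ with $V_i = \int_{v_i}^{v_{i+1}} v f_v(v)\,dv$. Next, using the fact that $p_{\bar{i}-1}^{r+1} \le 1$,
we further reduce the expression to

$$h^2\,\frac{\partial PT}{\partial v_{\bar{i}}} \ge \frac{E[b] f_v(v_{\bar{i}})(1-p_{\bar{i}}^{r+1})}{BC\log_2 M_{\bar{i}}\,(1-p_{\bar{i}})} \times (1-p_{\bar{i}}^{r+1})\left[\int_{v_{\bar{i}}}^{v_{\bar{i}+1}} v f_v(v)\,dv - v_{\bar{i}}\big(F_v(v_{\bar{i}+1}) - F_v(v_{\bar{i}})\big)\right] \tag{32}$$

$$> 0. \tag{33}$$

Finally, (33) follows because $\int_a^b x f(x)\,dx > \int_a^b a f(x)\,dx$ for $b > a$ whenever $f$ has positive mass on $(a, b]$.
Thus, it follows that $\partial PT/\partial v_{\bar{i}} > 0$.

We prove part 2 of Lemma 1 by investigating the terms of the gradient $\partial PT/\partial v_i = (h\,\partial g/\partial v_i -
g\,\partial h/\partial v_i)/h^2$. We have $h > 0$ and $\partial g/\partial v_i < 0$ unconditionally. Furthermore, $\partial h/\partial v_i \ge 0\ \forall i \ne \bar{i}$
and $g > 0$. Thus, $\partial PT/\partial v_i < 0\ \forall i \ne \bar{i}$. ■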



Appendix B
Proof of Lemma 2

Lemma 2. Define $X = \{i \text{ s.t. } t_i = \max_j t_j\}$. If $\{v_i,\ i \in X \text{ or } i-1 \in X\}$ are jointly scaled to
keep $X$ fixed, then

1) $\partial PT/\partial v_i > 0$ if $i \in X$ and $i-1 \notin X$

2) $\partial PT/\partial v_i < 0$ if $i \notin X$ and $i-1 \in X$

Proof: The special case of $|X| = 1$ is proved in Lemma 1. The case of $|X| > 1$ where the
elements of $X$ are non-consecutive also directly follows from Lemma 1 as one could jointly
increase $\{v_i\}\ \forall i \in X$ and decrease $\{v_{i+1}\}\ \forall i \in X$ such that the set $X$ is fixed. For the general case
where some elements of $X$ are consecutive, the set $X$ can be divided into subsets of consecutive
streams. For example, if $X = \{1, 3, 4\}$, the first subset is $\{1\}$ and the second subset is $\{3, 4\}$.
Within each subset, $\partial PT/\partial v_i > 0$ for the lower-most stream satisfying $i \in X$ and $i-1 \notin X$ by
part 1 of Lemma 1 and $\partial PT/\partial v_i < 0$ for the upper-most stream satisfying $i \notin X$ and $i-1 \in X$
by part 2 of Lemma 1. Thus, there exists an infinitesimal step $\boldsymbol{\epsilon} = \{\epsilon_1, \dots, \epsilon_S\}$ such that $\epsilon_i > 0$
if $i \in X$ and $i-1 \notin X$ and $\epsilon_i < 0$ if $i \notin X$ and $i-1 \in X$, keeping $X$ fixed and improving the
objective, and the result follows. ■

Appendix C
Proof of Theorem 1

Theorem 1. Thresholding Policy: The optimal loss visibility thresholds $\mathbf{v}^* = \{v_i^*\}_{i=2}^{S}$ satisfy

$$F_v(v_{i+1}^*) - F_v(v_i^*) = \frac{\log_2 M_i / n_i}{\sum_{j=1}^{S} \log_2 M_j / n_j} \quad \forall i = 1,\dots,S \tag{34}$$

where $n_i = (1-p_i^{r+1})/(1-p_i)$.

Proof: We present a convergent method that takes as input any feasible solution and obtains a
solution with an improved objective satisfying the condition stated above. Start with any feasible
solution and define the initial set of streams with the longest average transmission time $X =
\{i \text{ s.t. } t_i = \max_j t_j\}$. Construct an infinitesimal step $\boldsymbol{\epsilon} = \{\epsilon_1, \dots, \epsilon_S\}$ such that $\epsilon_i > 0$ if $i \in X$
and $i-1 \notin X$ and $\epsilon_i < 0$ if $i \notin X$ and $i-1 \in X$. By Lemma 2, there exists such an $\boldsymbol{\epsilon}$ such that $X$ is
unchanged and $PT(\mathbf{v}+\boldsymbol{\epsilon}) > PT(\mathbf{v})$. Repeat until $\min_{i \in X,\, j \notin X}\{t_i - t_j\} < \delta$ where $\delta$ is an arbitrarily
small positive number. This necessarily increases $|X|$. Repopulate $X$ according to the new $\{v_i\}$.
Repeat until $X = \{1, \dots, S\}$. Thus, the optimal policy necessarily satisfies $t_1 = t_2 = \dots = t_S$,
equivalently, $\big(F_v(v_{i+1}) - F_v(v_i)\big)(1-p_i^{r+1}) / \big(\log_2 M_i\,(1-p_i)\big) = \big(F_v(v_2) - F_v(v_1)\big)(1-p_1^{r+1}) / \big(\log_2 M_1\,(1-p_1)\big)\ \forall i$.
By taking $\sum_{i=1}^{S}\big(F_v(v_{i+1}) - F_v(v_i)\big) = 1$, the Theorem follows. ■